Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents
Authors
Abstract
Planning agents often lack the computational resources needed to build full planning trees for their environments. Agent designers commonly compensate for the resulting finite-horizon approximation by applying an evaluation function at the leaf states of the planning tree. Recent work has proposed an alternative approach to overcoming computational constraints on agent design: modify the reward function. In this work, we compare this reward design approach to the common leaf-evaluation heuristic approach for improving planning agents. We show that for many agents the reward design approach strictly subsumes the leaf-evaluation approach, i.e., for every leaf-evaluation heuristic there exists a reward function that leads to equivalent behavior, but the converse is not true. We demonstrate that this generality leads to improved performance when an agent makes approximations in addition to the finite-horizon approximation. As part of our contribution, we extend PGRD, an online reward design algorithm, to develop reward design algorithms for Sparse Sampling and UCT, two algorithms capable of planning in large state spaces.
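To make the subsumption claim concrete, the following minimal sketch (a hypothetical chain MDP and heuristic, not taken from the paper) folds a leaf-evaluation heuristic h into the reward via potential-based shaping: a depth-limited planner that uses the shaped reward with zero-valued leaves produces action values that differ from the leaf-evaluation planner's only by a root-dependent constant, so both select the same action. This is one standard construction that reproduces leaf-evaluation behavior through the reward; the paper's own construction and its PGRD-based algorithms for Sparse Sampling and UCT are not reproduced here.

```python
# Illustrative sketch (not the paper's exact construction): folding a
# leaf-evaluation heuristic h into the reward via potential-based shaping,
# so that depth-limited planning with the shaped reward and zero-valued
# leaves ranks actions identically to planning with the original reward
# and h applied at the leaves. The chain MDP below is hypothetical.

GAMMA = 0.9
N = 8                      # states 0..N-1 on a chain; state N-1 is rewarding
ACTIONS = (-1, +1)         # step left or right

def step(s, a):
    """Deterministic toy transition: move along the chain, clipped at the ends."""
    return max(0, min(N - 1, s + a))

def reward(s, a, s2):
    """Original reward: +1 for reaching the rightmost state, 0 otherwise."""
    return 1.0 if s2 == N - 1 else 0.0

def h(s):
    """Leaf-evaluation heuristic: states closer to the goal look better."""
    return -abs((N - 1) - s) * 0.1

def plan(s, depth, r_fn, leaf_fn):
    """Depth-limited deterministic lookahead; returns the best achievable value."""
    if depth == 0:
        return leaf_fn(s)
    return max(r_fn(s, a, step(s, a)) + GAMMA * plan(step(s, a), depth - 1, r_fn, leaf_fn)
               for a in ACTIONS)

def q_values(s, depth, r_fn, leaf_fn):
    """Action values at the root under a given reward and leaf evaluation."""
    return {a: r_fn(s, a, step(s, a)) + GAMMA * plan(step(s, a), depth - 1, r_fn, leaf_fn)
            for a in ACTIONS}

# Reward-design alternative: shape the reward with potential h and use zero leaves.
def shaped_reward(s, a, s2):
    return reward(s, a, s2) + GAMMA * h(s2) - h(s)

if __name__ == "__main__":
    s0, depth = 2, 3
    q_leaf   = q_values(s0, depth, reward, h)                  # leaf-evaluation planner
    q_design = q_values(s0, depth, shaped_reward, lambda s: 0) # reward-design planner
    # The shaping terms telescope along every depth-limited trajectory, so the
    # two Q-value sets differ only by the constant -h(s0) and argmax agrees.
    for a in ACTIONS:
        assert abs(q_design[a] - (q_leaf[a] - h(s0))) < 1e-9
    print(q_leaf, q_design)
```

The one-way nature of the claim is also visible here: any leaf heuristic can be moved into the reward this way, but an arbitrary modified reward need not correspond to any leaf-evaluation function.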
Similar References
SCALABLE PLANNING UNDER UNCERTAINTY
Autonomous agents that act in the real-world can often improve their success by capturing the uncertainty that arises because of their imperfect knowledge and potentially faulty actions. By making plans robust to uncertainty, agents can be prepared to counteract plan failure or act upon information that becomes available during plan execution. Such robust plans are valuable, but are often diffi...
Logical Encodings With No Time Indexes for Defining and Computing Admissible Heuristics for Planning
A limitation of the SAT approach to planning and the more recent Weighted-SAT approach to planning with preferences is the use of logical encodings where every fluent and action must be tagged with a time index. The result is that the complexity of the encodings grows exponentially with the planning horizon, and for metrics other than makespan, the optimality achieved is conditional on the plan...
Cost-Optimal Planning with Landmarks
Planning landmarks are facts that must be true at some point in every solution plan. Previous work has very successfully exploited planning landmarks in satisficing (non-optimal) planning. We propose a methodology for deriving admissible heuristic estimates for cost-optimal planning from a set of planning landmarks. The resulting heuristics fall into a novel class of multi-path dependent heuris...
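As a rough illustration of the landmark idea above (not the multi-path dependent heuristics of the cited work), the sketch below computes a simple landmark-count estimate. Its admissibility rests on two stated assumptions: unit action costs and no action achieving more than one of the remaining landmarks, in which case every unachieved landmark forces at least one distinct action and the count never overestimates the remaining cost. The predicate names and the example landmark set are hypothetical.

```python
# Illustrative sketch only (not the cited paper's heuristics): a simple
# landmark-count estimate for cost-optimal search. Admissibility here assumes
# unit action costs and that no action achieves more than one remaining landmark.

from typing import FrozenSet, Set

def landmark_count_heuristic(state: FrozenSet[str],
                             accepted: Set[str],
                             landmarks: Set[str]) -> int:
    """Estimate remaining cost as the number of landmarks not yet achieved.

    `accepted` tracks landmarks that have held at some point along the path
    to `state`; landmarks are path-dependent facts, so they are tracked during
    search rather than recomputed from the state alone.
    """
    still_needed = {lm for lm in landmarks
                    if lm not in accepted and lm not in state}
    return len(still_needed)

# Hypothetical usage inside an A*-style search:
landmarks = {"have_key", "door_open", "at_goal"}
state = frozenset({"at_start", "have_key"})
accepted = {"have_key"}                      # achieved earlier on this path
print(landmark_count_heuristic(state, accepted, landmarks))  # -> 2
```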
PAC optimal MDP planning with application to invasive species management
In a simulator-defined MDP, the Markovian dynamics and rewards are provided in the form of a simulator from which samples can be drawn. This paper studies MDP planning algorithms that attempt to minimize the number of simulator calls before terminating and outputting a policy that is approximately optimal with high probability. The paper introduces two heuristics for efficient exploration and a...
Sensible Agent Technology Improving Coordination and Communication in Biosurveillance Domains